9  Libraries

Author

Vladimir Buskin

9.2 Working with packages in R

Packages extend the basic functionality of R. Many of them considerably simplify common data wrangling tasks, while others provide frameworks for state-of-the-art methods in statistical analysis and natural language processing (NLP), among many other things.

9.2.1 Installation

How do I install a library?

In RStudio, navigate to Packages > Install and verify that the pop-up window says Install from: Repository (CRAN). You can now type the name of the package you would like to install into the Packages field.

Video tutorial on YouTube

This reader will use functions from a variety of R packages. Please install the following ones:

  • quanteda (for the analysis of text data)

  • tidyverse (a framework for data manipulation and visualisation)

  • readxl (for importing Microsoft Excel files)

  • writexl (for exporting Microsoft Excel files)

  • crosstable (for creating contingency tables)

  • flextable (for exporting contingency tables)
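The same installation can also be carried out from the console with install.packages(). The following is a minimal sketch rather than the prescribed method of this reader: the object names pkgs and missing_pkgs are illustrative, and the interactive() guard is an added assumption so that a script run non-interactively never starts a download on its own.

```r
# Packages recommended in this section
pkgs <- c("quanteda", "tidyverse", "readxl", "writexl",
          "crosstable", "flextable")

# Determine which of them are not installed yet
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))

# Install only the missing ones, and only in an interactive session
if (interactive() && length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
```

Checking for missing packages first avoids re-downloading packages that are already present on your machine.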

9.2.2 Loading packages

Once the installation is complete, you can load the libraries using the code below. You can ignore the warning messages that may appear while the packages load.

library(quanteda)
library(tidyverse)
library(readxl)
library(writexl)
library(crosstable)
library(flextable)
Activating libraries

Whenever you start a new R session (i.e., open RStudio), your libraries and their respective functions will be inactive. To re-activate a library, either use the library() function or simply select it in the Packages tab.
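To check which packages are currently active, the search path can be inspected with search(). The short sketch below uses tools, a package that ships with R, purely for illustration so that nothing needs to be installed; the object name is_attached is likewise illustrative.

```r
# search() returns the search path; attached packages appear as "package:<name>"
search()

# 'tools' ships with R but is usually not attached in a fresh session
library(tools)

# After library(), the package is on the search path
is_attached <- "package:tools" %in% search()
is_attached
```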

It is good practice to activate only those packages that are necessary for your analysis. While this won’t be a problem for the small set of packages shown here, loading dozens of packages increases the risk of encountering “homonymous” functions, which share a name but perform different operations. In this case, it helps to “disambiguate” them by stating explicitly which package a function comes from:

readxl::read_xlsx(...)
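A well-known case of such homonymy is filter(), which exists in both stats and dplyr. The sketch below reproduces the effect without any add-on packages by defining a function that happens to share its name with stats::sd(); this user-defined sd() is a contrived stand-in for a function from another package.

```r
# A user-defined function that shares its name with stats::sd()
sd <- function(x) "not a standard deviation"

# The unqualified name now finds the user-defined version first
sd(c(2, 4, 6))

# The package prefix makes the intended function explicit
stats::sd(c(2, 4, 6))

# conflicts() lists names that are defined in more than one place
# on the search path
conflicts()
```

The package::function() notation always reaches the intended function, regardless of the order in which packages were loaded.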

9.2.3 Citing packages

Whenever we draw on ideas other than our own, we give credit to the respective source by citing it appropriately. The same applies to R, RStudio, and all the packages we rely on in our analyses.

For R, an up-to-date citation can be generated as follows:

citation()

To cite R in publications use:

  R Core Team (2023). R: A language and environment for statistical
  computing. R Foundation for Statistical Computing, Vienna, Austria.
  URL https://www.R-project.org/.

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {R: A Language and Environment for Statistical Computing},
    author = {{R Core Team}},
    organization = {R Foundation for Statistical Computing},
    address = {Vienna, Austria},
    year = {2023},
    url = {https://www.R-project.org/},
  }

We have invested a lot of time and effort in creating R, please cite it
when using it for data analysis. See also 'citation("pkgname")' for
citing R packages.

To cite a specific package, simply supply the package name as an argument.

citation("quanteda")

To cite package 'quanteda' in publications use:

  Benoit K, Watanabe K, Wang H, Nulty P, Obeng A, Müller S, Matsuo A
  (2018). "quanteda: An R package for the quantitative analysis of
  textual data." _Journal of Open Source Software_, *3*(30), 774.
  doi:10.21105/joss.00774 <https://doi.org/10.21105/joss.00774>,
  <https://quanteda.io>.

A BibTeX entry for LaTeX users is

  @Article{,
    title = {quanteda: An R package for the quantitative analysis of textual data},
    journal = {Journal of Open Source Software},
    author = {Kenneth Benoit and Kohei Watanabe and Haiyan Wang and Paul Nulty and Adam Obeng and Stefan Müller and Akitaka Matsuo},
    doi = {10.21105/joss.00774},
    url = {https://quanteda.io},
    volume = {3},
    number = {30},
    pages = {774},
    year = {2018},
  }

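Doing this for every single package we have loaded would be quite tedious, but there are more elegant solutions. As a first step, sessionInfo() lists all packages that are attached in the current session, together with their version numbers.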

sessionInfo()
R version 4.2.3 (2023-03-15)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur ... 10.16

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] flextable_0.9.4  crosstable_0.7.0 writexl_1.4.2    readxl_1.4.3    
 [5] lubridate_1.9.2  forcats_1.0.0    stringr_1.5.0    dplyr_1.1.4     
 [9] purrr_1.0.1      readr_2.1.4      tidyr_1.3.0      tibble_3.2.1    
[13] ggplot2_3.5.0    tidyverse_2.0.0  quanteda_3.3.1  

loaded via a namespace (and not attached):
 [1] jsonlite_1.8.8          RcppParallel_5.1.7      shiny_1.7.4.1          
 [4] askpass_1.1             fontLiberation_0.1.0    cellranger_1.1.0       
 [7] yaml_2.3.7              gdtools_0.3.3           pillar_1.9.0           
[10] backports_1.4.1         lattice_0.21-8          glue_1.6.2             
[13] uuid_1.1-0              digest_0.6.33           promises_1.2.0.1       
[16] checkmate_2.2.0         colorspace_2.1-0        htmltools_0.5.7        
[19] httpuv_1.6.11           Matrix_1.6-5            gfonts_0.2.0           
[22] fontBitstreamVera_0.1.1 pkgconfig_2.0.3         httpcode_0.3.0         
[25] xtable_1.8-4            scales_1.3.0            later_1.3.1            
[28] officer_0.6.2           fontquiver_0.2.1        tzdb_0.4.0             
[31] openssl_2.1.0           timechange_0.2.0        generics_0.1.3         
[34] ellipsis_0.3.2          withr_2.5.0             cli_3.6.2              
[37] magrittr_2.0.3          crayon_1.5.2            mime_0.12              
[40] evaluate_0.21           stopwords_2.3           fansi_1.0.4            
[43] xml2_1.3.6              textshaping_0.3.6       tools_4.2.3            
[46] data.table_1.14.8       hms_1.1.3               lifecycle_1.0.3        
[49] munsell_0.5.0           zip_2.3.0               compiler_4.2.3         
[52] systemfonts_1.0.4       rlang_1.1.2             grid_4.2.3             
[55] rstudioapi_0.15.0       htmlwidgets_1.6.4       rmarkdown_2.23         
[58] gtable_0.3.3            curl_5.2.1              R6_2.5.1               
[61] knitr_1.43              fastmap_1.1.1           utf8_1.2.3             
[64] fastmatch_1.1-4         ragg_1.2.5              stringi_1.8.3          
[67] crul_1.4.0              Rcpp_1.0.11             vctrs_0.6.5            
[70] tidyselect_1.2.0        xfun_0.39              
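The information returned by sessionInfo() can also be accessed programmatically, which may be convenient when reporting package versions in a methods section. The following is a rough sketch; the object names info, pkg_names, pkg_versions and pkg_table are illustrative, and the table only covers attached packages, not those merely loaded via a namespace.

```r
info <- sessionInfo()

# Base packages plus any add-on packages attached via library()
pkg_names <- c(info$basePkgs, names(info$otherPkgs))

# Look up the installed version of each package
pkg_versions <- vapply(
  pkg_names,
  function(p) as.character(packageVersion(p)),
  character(1)
)

# Combine names and versions into a compact table
pkg_table <- data.frame(
  package = pkg_names,
  version = pkg_versions,
  row.names = NULL
)
pkg_table
```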

An excellent package that specialises in reporting the output of statistical analyses, including the packages used, is report. Its cite_packages() function returns references for the packages used in the current session.

library(report)

cite_packages()
  - Benoit K, Watanabe K, Wang H, Nulty P, Obeng A, Müller S, Matsuo A (2018). "quanteda: An R package for the quantitative analysis of textual data." _Journal of Open Source Software_, *3*(30), 774. doi:10.21105/joss.00774 <https://doi.org/10.21105/joss.00774>, <https://quanteda.io>.
  - Chaltiel D (2023). _crosstable: Crosstables for Descriptive Analyses_. R package version 0.7.0, <https://CRAN.R-project.org/package=crosstable>.
  - Gohel D, Skintzos P (2023). _flextable: Functions for Tabular Reporting_. R package version 0.9.4, <https://CRAN.R-project.org/package=flextable>.
  - Grolemund G, Wickham H (2011). "Dates and Times Made Easy with lubridate." _Journal of Statistical Software_, *40*(3), 1-25. <https://www.jstatsoft.org/v40/i03/>.
  - Makowski D, Lüdecke D, Patil I, Thériault R, Ben-Shachar M, Wiernik B (2023). "Automated Results Reporting as a Practical Tool to Improve Reproducibility and Methodological Best Practices Adoption." _CRAN_. <https://easystats.github.io/report/>.
  - Müller K, Wickham H (2023). _tibble: Simple Data Frames_. R package version 3.2.1, <https://CRAN.R-project.org/package=tibble>.
  - Ooms J (2023). _writexl: Export Data Frames to Excel 'xlsx' Format_. R package version 1.4.2, <https://CRAN.R-project.org/package=writexl>.
  - R Core Team (2023). _R: A Language and Environment for Statistical Computing_. R Foundation for Statistical Computing, Vienna, Austria. <https://www.R-project.org/>.
  - Wickham H (2016). _ggplot2: Elegant Graphics for Data Analysis_. Springer-Verlag New York. ISBN 978-3-319-24277-4, <https://ggplot2.tidyverse.org>.
  - Wickham H (2022). _stringr: Simple, Consistent Wrappers for Common String Operations_. R package version 1.5.0, <https://CRAN.R-project.org/package=stringr>.
  - Wickham H (2023). _forcats: Tools for Working with Categorical Variables (Factors)_. R package version 1.0.0, <https://CRAN.R-project.org/package=forcats>.
  - Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). "Welcome to the tidyverse." _Journal of Open Source Software_, *4*(43), 1686. doi:10.21105/joss.01686 <https://doi.org/10.21105/joss.01686>.
  - Wickham H, Bryan J (2023). _readxl: Read Excel Files_. R package version 1.4.3, <https://CRAN.R-project.org/package=readxl>.
  - Wickham H, François R, Henry L, Müller K, Vaughan D (2023). _dplyr: A Grammar of Data Manipulation_. R package version 1.1.4, <https://CRAN.R-project.org/package=dplyr>.
  - Wickham H, Henry L (2023). _purrr: Functional Programming Tools_. R package version 1.0.1, <https://CRAN.R-project.org/package=purrr>.
  - Wickham H, Hester J, Bryan J (2023). _readr: Read Rectangular Text Data_. R package version 2.1.4, <https://CRAN.R-project.org/package=readr>.
  - Wickham H, Vaughan D, Girlich M (2023). _tidyr: Tidy Messy Data_. R package version 1.3.0, <https://CRAN.R-project.org/package=tidyr>.
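If a BibTeX file is needed instead of formatted references, base R alone may suffice. The sketch below loops over a vector of package names and collects the output of toBibtex(citation(...)); the selection in pkgs and the file name packages.bib are hypothetical and should be adapted to your own analysis.

```r
# Illustrative selection; replace with the packages used in your analysis
pkgs <- c("base", "stats", "utils")

# Convert each citation to BibTeX and flatten the result to plain lines,
# separating the entries with an empty line
bib_lines <- unlist(lapply(pkgs, function(p) {
  c(as.character(toBibtex(citation(p))), "")
}))

# Write the entries to a temporary file (hypothetical file name)
bib_file <- file.path(tempdir(), "packages.bib")
writeLines(bib_lines, bib_file)
```

Entries generated this way lack a citation key (note the empty key in `@Manual{,` in the output further above), so keys have to be added by hand before the file can be used with LaTeX.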